The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
LiDAR-based 3D Object detectors have achieved impressive performances in many benchmarks, however, multisensors fusion-based techniques are promising to further improve the results. PointPainting, as a recently proposed framework, can add the semantic information from the 2D image into the 3D LiDAR point by the painting operation to boost the detection performance. However, due to the limited resolution of 2D feature maps, severe boundary-blurring effect happens during re-projection of 2D semantic segmentation into the 3D point clouds. To well handle this limitation, a general multimodal fusion framework MSF has been proposed to fuse the semantic information from both the 2D image and 3D points scene parsing results. Specifically, MSF includes three main modules. First, SOTA off-the-shelf 2D/3D semantic segmentation approaches are employed to generate the parsing results for 2D images and 3D point clouds. The 2D semantic information is further re-projected into the 3D point clouds with calibrated parameters. To handle the misalignment between the 2D and 3D parsing results, an AAF module is proposed to fuse them by learning an adaptive fusion score. Then the point cloud with the fused semantic label is sent to the following 3D object detectors. Furthermore, we propose a DFF module to aggregate deep features in different levels to boost the final detection performance. The effectiveness of the framework has been verified on two public large-scale 3D object detection benchmarks by comparing with different baselines. The experimental results show that the proposed fusion strategies can significantly improve the detection performance compared to the methods using only point clouds and the methods using only 2D semantic information. Most importantly, the proposed approach significantly outperforms other approaches and sets new SOTA results on the nuScenes testing benchmark.
translated by 谷歌翻译
The input and output of most text generation tasks can be transformed to two sequences of tokens and they can be modeled using sequence-to-sequence learning modeling tools such as Transformers. These models are usually trained by maximizing the likelihood the output text sequence and assumes the input sequence and all gold preceding tokens are given during training, while during inference the model suffers from the exposure bias problem (i.e., it only has access to its previously predicted tokens rather gold tokens during beam search). In this paper, we propose MoCa ({\bf Mo}mentum {\bf Ca}libration) for text generation. MoCa is an online method that dynamically generates slowly evolving (but consistent) samples using a momentum moving average generator with beam search and MoCa learns to align its model scores of these samples with their actual qualities. Experiments on four text generation datasets (i.e., CNN/DailyMail, XSum, SAMSum and Gigaword) show MoCa consistently improves strong pre-trained transformers using vanilla fine-tuning and we achieve the state-of-the-art results on CNN/DailyMail and SAMSum datasets.
translated by 谷歌翻译
配备高速数字化器的前端电子设备正在使用并建议将来的核检测器。最近的文献表明,在处理来自核检测器的数字信号时,深度学习模型,尤其是一维卷积神经网络。模拟和实验证明了该领域神经网络的令人满意的准确性和其他好处。但是,仍需要研究特定的硬件加速在线操作。在这项工作中,我们介绍了Pulsedl-II,这是一种专门设计的,专门为事件功能(时间,能量等)从具有深度学习的脉冲中提取的应用。根据先前的版本,PULSEDL-II将RISC CPU纳入系统结构,以更好地功能灵活性和完整性。 SOC中的神经网络加速器采用三级(算术单元,处理元件,神经网络)层次结构,并促进数字设计的参数优化。此外,我们设计了一种量化方案和相关的实现方法(恢复和位移位),以在所选层类型的选定子集中与深度学习框架(例如Tensorflow)完全兼容。通过当前方案,支持神经网络的量化训练,并通过专用脚本自动将网络模型转换为RISC CPU软件,几乎没有准确性损失。我们在现场可编程门阵列(FPGA)上验证pulsedl-ii。最后,通过由直接数字合成(DDS)信号发生器和带有模数转换器(ADC)的FPGA开发板组成的实验设置进行系统验证。拟议的系统实现了60 PS的时间分辨率和0.40%的能量分辨率,在线神经网络推断在信号与噪声比(SNR)为47.4 dB时。
translated by 谷歌翻译
本文介绍了Z-Code ++,这是一种针对抽象文本摘要优化的新的预训练的语言模型。该模型使用三种技术扩展了艺术编码器模型的状态。首先,我们使用两阶段的预训练过程来改善模型在低资源摘要任务上的性能。该模型首先是使用文本语料库进行语言理解的预先培训的,然后在汇总语料库中不断预先培训,以进行基础文本生成。其次,我们用分离的注意力层代替编码器中的自我发项层,其中每个单词都使用两个向量分别代表其内容和位置。第三,我们使用融合编码器,这是一种以层次方式编码长序列的简单而有效的方法。 Z-Code ++在13个文本摘要任务中的9个跨5种语言中创建了新的艺术状态。我们的模型的参数有效,因为它的表现优于XSUM上600倍较大的Palm-540b,并且在Samsum上的易经的200倍GPT3-175B较大。在零射击和少量设置中,我们的模型大大优于竞争模型。
translated by 谷歌翻译
为了稳定地训练生成对抗网络(GAN),将实例噪声注入歧视器的输入中被认为是理论上的声音解决方案,但是,在实践中尚未实现其承诺。本文介绍了采用高斯混合物分布的扩散 - 在正向扩散链的所有扩散步骤中定义,以注入实例噪声。从观察到或生成的数据扩散的混合物中的随机样品被作为歧视器的输入。通过将其梯度通过前向扩散链进行反向传播来更新,该链的长度可自适应地调节以控制每个训练步骤允许的最大噪声与数据比率。理论分析验证了所提出的扩散gan的声音,该扩散器提供了模型和域 - 不可分割的可区分增强。在各种数据集上进行的一系列实验表明,扩散 - GAN可以提供稳定且具有数据效率的GAN训练,从而使对强GAN基准的性能保持一致,以综合构成照片现实的图像。
translated by 谷歌翻译
大型语言模型已被证明可以使用少量学习来实现各种自然语言任务的出色表现,这大大减少了将模型调整到特定应用程序所需的特定任务培训示例的数量。为了进一步了解量表对少量学习的影响,我们培训了一个5400亿个参数,密集激活的变压器语言模型,我们称之为“途径”语言模型棕榈。我们使用Pathways在6144 TPU V4芯片上训练了Palm,这是一种新的ML系统,可在多个TPU POD上进行高效的训练。我们通过在数百种语言理解和产生基准的基准方面实现最先进的学习结果来证明扩展的持续好处。在这些任务中,Palm 540B实现了突破性的表现,在一系列多步推理任务上表现出色,超过了最新的最新表现,并且在最近发布的Big Benchmark上表现优于平均人类表现。大量的大型基础任务显示出与模型量表的不连续改进,这意味着当我们扩展到最大模型时,性能急剧增加。 Palm在多语言任务和源代码生成方面也具有很强的功能,我们在各种基准测试中证明了这一点。我们还提供了有关偏见和毒性的全面分析,并研究了训练数据记忆的程度,相对于模型量表。最后,我们讨论与大语言模型有关的道德考虑,并讨论潜在的缓解策略。
translated by 谷歌翻译
已经提出了高效和自适应计算机视觉系统以使计算机视觉任务,例如图像分类和对象检测,针对嵌入或移动设备进行了优化。这些解决方案最近的起源,专注于通过设计具有近似旋钮的自适应系统来优化模型(深神经网络,DNN)或系统。尽管最近的几项努力,但我们表明现有解决方案遭受了两个主要缺点。首先,系统不考虑模型的能量消耗,同时在制定要运行的模型的决定时。其次,由于其他共同居民工作负载,评估不考虑设备上的争用的实际情况。在这项工作中,我们提出了一种高效和自适应的视频对象检测系统,这是联合优化的精度,能量效率和延迟。底层Virtuoso是一个多分支执行内核,它能够在精度 - 能量 - 延迟轴上的不同运行点处运行,以及轻量级运行时调度程序,以选择最佳的执行分支以满足用户要求。要与Virtuoso相当比较,我们基准于15件最先进的或广泛使用的协议,包括更快的R-CNN(FRCNN),YOLO V3,SSD,培训台,SELSA,MEGA,REPP,FastAdapt和我们的内部FRCNN +,YOLO +,SSD +和高效+(我们的变体具有增强的手机效率)的自适应变体。通过这种全面的基准,Virtuoso对所有上述协议显示出优势,在NVIDIA Jetson Mobile GPU上的每一项效率水平上引领精度边界。具体而言,Virtuoso的准确性为63.9%,比一些流行的物体检测模型高于10%,51.1%,yolo为49.5%。
translated by 谷歌翻译
今天的大部分AI系统都专注于使用自我关注机制和变压器架构在大量多样化的数据中实现令人印象深刻的性能收益。在本文中,我们建议使用外部注意机制增强变压器架构,以带来外部知识和背景。通过将外部信息集成到预测过程中,我们希望减少对更大的模型的需求,并增加AI系统的民主化。我们发现所提出的外部注意机制可以显着提高现有AI系统的性能,使从业者可以轻松地将基础AI模型自定义到许多不同的下游应用程序。特别是,我们专注于勤杂朗语推理的任务,展示所提出的外部注意机制可以增加现有的变压器模型,并显着提高模型的推理能力。拟议的系统,知识外部关注推理(Kear),达到了开放的铜商QA研究基准的人类奇偶校验,其准确性为89.4 \%,与人类准确性为88.9 \%。
translated by 谷歌翻译
大多数政策评估算法基于Bellman期望和最优性方程的理论,它导出了两个流行的方法 - 政策迭代(PI)和价值迭代(VI)。然而,由于多步骤禁止校正的大方差,多步引导往往是在基于PI的基于PI的方法的交叉目的和禁止策略学习。相比之下,基于VI的方法是自然的违规政策,但受到一步学习的影响。本文通过利用具有最优值函数的多步自举函数的潜在结构来推导新的多步贝尔曼最优性方程。通过这种新的等式,我们推出了一种新的多步值迭代方法,该方法将以指数收缩率$ \ mathcal {o}(\ gamma ^ n)$但仅线性计算复杂度收敛到最佳值函数。此外,它可以自然地推导出一套多步脱离策略算法,可以安全地利用任意策略收集的数据,无需校正。实验表明,所提出的方法是可靠的,易于实施和实现最先进的性能在一系列标准基准数据集上。
translated by 谷歌翻译